Abstract. Fitness landscapes are genotype-to-fitness mappings commonly used in 
evolutionary biology and computer science which are closely related to spin glass 
models. In this paper, we study the NK model for fitness landscapes where the 
interaction scheme between genes can be explicitly defined. The focus is on how this 
scheme influences the overall shape of the landscape. Our main tools for the analysis 
are adaptive walks, an idealized dynamics by which the population moves uphill in 
fitness and terminates at a local fitness maximum. We use three different types of 
walks and investigate how their length (the number of steps required to reach a local 
peak) and height (the fitness at the endpoint of the walk) depend on the dimensionality 
and structure of the landscape. We find that the distribution of local maxima over the 
landscape is particularly sensitive to the choice of interaction pattern. Most quantities 
that we measure are simply correlated to the rank of the scheme, which is equal to 
the number of nonzero coefficients in the expansion of the fitness landscape in terms 
of Walsh functions. 

In evolutionary biology, adaptation is the process by which the genetic structure of 
a population changes in response to its environment. This process relies on two basic 
requirements: The supply of new individuals that differ from the prevalent ones, and the 
selection of individuals that have some kind of advantage over the others. Differences 
between individuals can be ascribed to differences in their genetic blueprint, the DNA, 
that are caused, e.g., by mutation, and the advantage that is relevant for selection 
is an increased number of offspring that the better adapted individuals leave in the 
next generation. Instead of using four-lettered DNA sequences, the genotype is often 
represented as a binary sequence of length L. Its letters, usually taken to be 0 and 1, 
are then interpreted as two different alleles that can be present at a genetic locus. The 
set of genotypes has the structure of a hypercube, a graph with nodes corresponding to 
sequences and edges connecting two sequences when they differ by a point mutation in 
a single letter. Assuming that the genotype fully specifies the reproductive success of an 
individual, one may envision a mapping from the space of genotypes to the number of 
offspring or some related fitness measure. Such a mapping is called a fitness landscape. 

Mutations modify the genotype by changing a certain letter from zero to one or 
vice versa. Whenever a new mutation arises it may become fixed, which means that 
it is carried by all individuals in the population. The chance for this event increases 
with the fitness of the new genotype compared to the average fitness of the population 
[5]. If the fitness decreases due to mutation, fixation can only happen by stochastic 
fluctuations [6]. As the strength of these fluctuations decreases with population size, for 
large populations a mutant can survive only if its fitness is larger than average. When 
additionally the rate of supply of new mutants is low such that the timescale of fixation 
is much smaller than the typical time between the appearance of different mutants, the 
population is monomorphic most of the time. In this regime of strong selection and weak 
mutation, the dynamics can be approximated as an adaptive walk, in which the 
whole population is treated as a single entity that travels uphill in the fitness landscape 
by single mutational steps. Since the fitness has to increase in each step, these walks 
terminate when there is no neighboring genotype with larger fitness available, i.e., when 
a local fitness maximum has been reached. 

The structure of the underlying fitness landscape is crucial for population dynamics 
like adaptive walks and is often characterized in terms of its “ruggedness”, which can be 
measured, for instance, by the number of local maxima. Though an increasing number of 
empirical fitness landscapes are available for small sequence lengths L, the mechanism 
by which a genotype affects the fitness 
is exceedingly complex, and therefore probabilistic models are often used for theoretical 
studies. In the simplest case, the fitness values are assigned independently to each 
genotype from some probability distribution, a setting referred to as the House of Cards 
(HoC) model [16, 17]. 

A more sophisticated model for fitness landscapes is the NK model [18, 19], which is 
based on the following idea: The total fitness of a genotype is the sum of several different 
contributions that are related to different properties of the individual and depend on 
different parts of the genotype. How large these parts are is controlled by the parameter 
K, while N specifies the sequence length in standard notation and hence the name of 
the model (note however that we will use L instead of N, as the latter is often reserved 
for the population size). Different parts may, and usually do, overlap and therefore one 
gene influences several contributions to the total fitness. The pattern into which the 
genotype is sectioned specifies the scheme of interaction between genes, also known as 
the genetic neighborhood. 

Fitness landscapes in general and the NK model in particular are also relevant to 
fields outside of biology. In physics, the concept of an energy landscape is very similar to 
that of a fitness landscape. While a population evolves into a state with high fitness, 
physical systems are driven to states of low energy. Binary sequences in particular can 
naturally be interpreted as the configuration of a system with interacting spins. In this 
context the HoC model is the analogue of Derrida’s random energy model [21], while the 
NK model can be interpreted as a superposition of diluted p-spin glass models [22]. In 
computer science, the NK model is used as a benchmark for optimization and especially 
as an example of an NP-complete problem [23, 24, 25, 26]. 

Among the large number of studies on the NK model (e.g., [31, 32, 33, 34, 35]; see also 
section 2.2), only a few have explicitly addressed how the choice 
of the interaction scheme affects the properties of the landscapes [30, 37]. The answer to 
this question turns out to depend strongly on the quantity under consideration. On the 
one hand, despite earlier claims to the contrary [38], the fitness autocorrelation function 
is manifestly independent of the interaction scheme [39]. On the other hand, the 
accessibility of the global fitness maximum along paths of monotonically increasing 
fitness is highly sensitive to the structure of the genetic neighborhood [34], [36]. 

The goal of this article is to systematically study the influence of different genetic 
interaction schemes on the landscape. Our main tools for the analysis are adaptive walks, 
for two reasons. First, despite their simplicity, adaptive walks represent a biologically 
relevant limit of population dynamics and are commonly used for the interpretation of 
microbial evolution experiments. Second, and most importantly, adaptive walks 
allow for the numerical study of rather large landscapes. Keeping in mind that a 
landscape consists of $2^L$ genotypes, it is impossible to keep track of all of them when 
the genotype size L becomes large. Even the study of local maxima, which does not 
necessarily require the knowledge of the entire landscape, becomes infeasible quickly 
since their relative frequency decreases exponentially with L [28, 29, 30]. Adaptive 
walks, on the other hand, find local maxima rather fast, require only a tiny fraction of 
the landscape to be known and are still strongly influenced by the overall shape of the 
landscape, i.e., they conveniently translate global properties of the landscape into local 
ones. 

The article is structured as follows: In section 2 we provide the mathematical 
framework and discuss the models for fitness landscapes and adaptive walks in more 
detail. The results can be found in section 3, which is divided into three parts. In 
section 3.1 we study a specific interaction scheme that enables us to derive several 
quantities of interest analytically, which subsequently serve as a point of comparison 
to other genetic neighborhood types. In section 3.2 we examine numerically the 
neighborhood types that are most common in the literature, and in section 3.3 we discuss 
the clustering of local maxima. We then introduce the rank of an interaction scheme 
as a possible quantification of neighborhood types in section 3.4 and show that 
most landscape properties are correlated with it. Finally, the results are summarized 
and discussed in section 4. 

2. Models and methods 

2.1. Space of genotypes and fitness landscapes 

In general, a genotype can be represented by a sequence of L letters that are drawn from 
an alphabet of a specific size. Here we will assume a binary alphabet for simplicity, i.e., 
each genotype $\sigma = (\sigma_1, \dots, \sigma_L)$ is an element of $\{0,1\}^L$. Together with the Hamming 
distance 

$$d(\sigma, \tau) = \sum_{i=1}^{L} \left(1 - \delta_{\sigma_i \tau_i}\right) \qquad (1)$$

this can be extended to a metric space, the hypercube $\mathbb{H}_2^L$. The distance $d(\sigma, \tau)$ is the 
minimal number of mutations required to change the genotype from $\sigma$ to $\tau$ (or vice 
versa). A succession 

$$\mathcal{E} = \sigma^1 \to \sigma^2 \to \dots \to \sigma^n \qquad (2)$$

of genotypes is called a path if $d(\sigma^i, \sigma^{i+1}) = 1$ for all i. 
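
The definitions (1) and (2) can be illustrated by a short Python sketch. It is not part of the original analysis; genotypes are represented as tuples of 0/1, and the function names `hamming`, `neighbors` and `is_path` are hypothetical choices made for this illustration.

```python
from typing import List, Sequence, Tuple

Genotype = Tuple[int, ...]  # assumed representation: tuple of 0/1 alleles


def hamming(sigma: Genotype, tau: Genotype) -> int:
    """Number of loci at which two genotypes differ, as in (1)."""
    if len(sigma) != len(tau):
        raise ValueError("genotypes must have the same length")
    return sum(1 for s, t in zip(sigma, tau) if s != t)


def neighbors(sigma: Genotype) -> List[Genotype]:
    """All L genotypes at Hamming distance 1 from sigma."""
    return [sigma[:i] + (1 - sigma[i],) + sigma[i + 1:] for i in range(len(sigma))]


def is_path(genotypes: Sequence[Genotype]) -> bool:
    """True if consecutive genotypes differ by exactly one point mutation, as in (2)."""
    return all(hamming(a, b) == 1 for a, b in zip(genotypes, genotypes[1:]))
```

Every genotype has exactly L neighbors on the hypercube, which is the property used repeatedly in the counting arguments of this section.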

In order to quantify the reproductive value of a certain genotype $\sigma$, a fitness value 
$F(\sigma) \in \mathbb{R}$ is assigned to each sequence. This mapping is called a fitness landscape. 
The fitness is a measure of how well the organism is adapted to its environment, and 
can be related to the (mean) number of offspring an individual with the corresponding 
genotype will leave in the next generation. A mutation from $\sigma$ to $\tau$ is called beneficial 
if $F(\tau) > F(\sigma)$, and deleterious if $F(\tau) < F(\sigma)$. Due to natural selection, only 
beneficial mutations can become prevalent in large populations. Therefore a population 
undergoing adaptation propagates through the space of genotypes along a path of 
monotonically increasing fitness. Such a path, where $F(\sigma^{i+1}) > F(\sigma^i)$ for all i, is 
called accessible. 

Commonly used probabilistic models for fitness landscapes are the House-of-Cards 
(HoC) model, the Rough-Mount-Fuji (RMF) model [42, 45] and the NK model. Out 
of these three, the HoC model is the simplest as the fitness values $F(\sigma)$ are assigned 
independently to each genotype $\sigma$ from some probability distribution. For the HoC 
model, the number of maxima is particularly easy to calculate. A sequence $\sigma$ is a 
local maximum if and only if its fitness is larger than that of all of its neighbors, i.e., if it 
is the largest of L + 1 random variables. The probability for this is 1/(L + 1) and there 
are $2^L$ genotypes in the landscape, hence the expected number of maxima is $2^L/(L+1)$. 
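
The counting argument above can be checked numerically for small L. The following is a minimal sketch, assuming uniformly distributed fitness values (any continuous distribution gives the same count, since only the rank order matters); genotypes are encoded as integers whose bits are the alleles, and the function names are hypothetical.

```python
import random


def count_local_maxima_hoc(L: int, rng: random.Random) -> int:
    """Count the local maxima of one House-of-Cards realization on {0,1}^L."""
    n = 1 << L
    # One independent fitness value per genotype (assumption: uniform on [0, 1)).
    fitness = [rng.random() for _ in range(n)]
    count = 0
    for g in range(n):
        # Flipping bit i of the integer g yields the i-th neighbor of g.
        if all(fitness[g] > fitness[g ^ (1 << i)] for i in range(L)):
            count += 1
    return count


def mean_maxima(L: int, realizations: int, seed: int = 0) -> float:
    """Average number of local maxima over independent realizations."""
    rng = random.Random(seed)
    total = sum(count_local_maxima_hoc(L, rng) for _ in range(realizations))
    return total / realizations


if __name__ == "__main__":
    L = 8
    print("simulated:", mean_maxima(L, 300), "expected:", 2**L / (L + 1))
```

The exhaustive enumeration scales as $L\,2^L$ and is therefore feasible only for small L, which is the practical limitation that motivates the use of adaptive walks.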


2.2. The NK fitness landscapes 


The NK-model introduces correlations between the fitness values of different genotypes. 
In this model, the fitness $F(\sigma)$ of a sequence $\sigma$ is given by 

$$F(\sigma) = \sum_{i=1}^{L} f_i\left(\sigma_{b_{i,1}}, \sigma_{b_{i,2}}, \dots, \sigma_{b_{i,K}}\right) \qquad (3)$$

where the $f_i$ are independent HoC landscapes of size K, i.e., the $f_i(\sigma)$ are random 
numbers drawn independently from the same distribution for each i and each 
$\sigma \in \{0,1\}^K$, such that a total of $L\,2^K$ random numbers are required for generating 
one realization of the landscape. Unless mentioned otherwise, we will use $f_i$ drawn from 
a standard normal distribution throughout this article, i.e., the marginal fitness of a 
specific genotype is normally distributed with zero mean and variance L. 

The $b_{i,j}$ determine the interaction between genetic loci. For some purposes, it is 
more convenient to express the interaction matrix $b_{i,j}$ in terms of neighborhood sets 

$$V_i = \{b_{i,1}, b_{i,2}, \dots, b_{i,K}\}. \qquad (4)$$

From a biological viewpoint there are no obvious constraints on the structure of the 
interaction sets, but in the NK literature it is usually assumed that the number of sets 
is equal to the sequence length L, that all sets contain the same number K of elements, 
and that $i \in V_i$ for all i. The parameter K is interpreted as a ruggedness parameter 
and interpolates from a purely additive landscape with a single maximum (K = 1) to 
the maximally rugged HoC landscape (K = L). Note that we use K to denote the 
total number of elements in an interaction set. This is slightly different from the usual 
definition, where K is the number of elements in addition to i, and hence in our notation 
K is increased by 1 compared to the standard notation. 

The most common types of neighborhoods, which we are also going to use in this 
article, are the following (see figure 1 for illustration). 


Adjacent neighborhood: Each sub-landscape $f_i$ depends on the i-th locus and its 
K − 1 neighbors. The neighborhood sets are given by 

$$V_i = \{i, i+1, \dots, i+K-1\}, \qquad (5)$$

each element modulo L. 

Random neighborhood: The neighborhood set $V_i$ contains i and K − 1 other 
numbers, which are chosen at random from {1, 2,..., L}. 

Block neighborhood: The neighborhood sets are given by 

$$V_i = \left\{ K \left\lfloor \frac{i-1}{K} \right\rfloor + 1,\; K \left\lfloor \frac{i-1}{K} \right\rfloor + 2,\; \dots,\; K \left\lfloor \frac{i-1}{K} \right\rfloor + K \right\}, \qquad (6)$$

where $\lfloor x \rfloor$ is the floor function. This means that K consecutive sets are equal, 
dividing the genotype into L/K independent blocks (L should be an integer multiple 
of K here). Each block is a HoC landscape. 

Figure 1. Illustration of the standard neighborhoods for L = 9 and K = 3. From 
left to right: Adjacent, block, and random neighborhood. A grey square in row i and 
column j means that $V_i$ contains j. 
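
The three neighborhood types and the fitness function (3) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used for the results: loci are indexed from 0 instead of 1, the K − 1 additional loci of the random neighborhood are assumed to be distinct and different from i, and the class `NKLandscape` with its lazy generation of the $f_i$ values is a hypothetical design.

```python
import random
from typing import Dict, List, Sequence, Tuple


def adjacent_neighborhoods(L: int, K: int) -> List[List[int]]:
    """V_i = {i, i+1, ..., i+K-1} modulo L, cf. (5), with 0-based loci."""
    return [[(i + j) % L for j in range(K)] for i in range(L)]


def block_neighborhoods(L: int, K: int) -> List[List[int]]:
    """V_i is the block of K consecutive loci containing i, cf. (6)."""
    if L % K != 0:
        raise ValueError("L must be an integer multiple of K")
    return [[K * (i // K) + j for j in range(K)] for i in range(L)]


def random_neighborhoods(L: int, K: int, rng: random.Random) -> List[List[int]]:
    """V_i contains i and K-1 other loci (assumption: drawn without replacement)."""
    result = []
    for i in range(L):
        others = rng.sample([j for j in range(L) if j != i], K - 1)
        result.append(sorted([i] + others))
    return result


class NKLandscape:
    """NK landscape whose sub-landscape values f_i are generated on demand."""

    def __init__(self, neighborhoods: List[List[int]], rng: random.Random):
        self.neighborhoods = neighborhoods
        self.rng = rng
        # One lookup table per sub-landscape f_i, filled lazily.
        self.tables: List[Dict[Tuple[int, ...], float]] = [{} for _ in neighborhoods]

    def fitness(self, sigma: Sequence[int]) -> float:
        """Sum of the contributions f_i evaluated on the loci in V_i, cf. (3)."""
        total = 0.0
        for table, V in zip(self.tables, self.neighborhoods):
            key = tuple(sigma[j] for j in V)
            if key not in table:
                table[key] = self.rng.gauss(0.0, 1.0)  # standard normal f_i
            total += table[key]
        return total
```

Generating the values of the $f_i$ only when they are first requested avoids storing all $L\,2^K$ random numbers, which matters when only a small part of the landscape is ever visited.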


Compared to the HoC model, little is known analytically about the NK model, 
except for some special cases. One example, apart from the additive case K = 1 and 
the HoC case K = L, is the model with block neighborhood which facilitates analytical 
approaches due to its modular structure [23, 32]. As a consequence, properties like the 
number of maxima or the number of accessible paths to the global maximum can be 
derived from results for the HoC model [27, 36]. A detailed analysis of the NK model 
with adjacent and random neighborhoods was carried out by Weinberger [46], who 
derived approximate asymptotic expressions for the number and fitness values of local 
maxima as well as for the mean length of adaptive walks. In accordance with much 
of the early literature on the NK-model (e.g., [19]), Weinberger concluded that these 
quantities are the same for the adjacent and random neighborhoods or at least that 
differences between the two schemes are minor. 

A rigorous result for the adjacent neighborhood model is that the mean number of 
maxima grows asymptotically as $(2\lambda_K)^L$ with a constant $\lambda_K$ that increases with K and 
depends in general on the underlying fitness distribution of the sub-landscapes $f_i$ [29]. 
The constant is known exactly for a few distributions and specific, small values of K. For 
the exponential distribution, $\lambda_2 \approx 0.5627$ and $\lambda_3 \approx 0.6114$, for a gamma distribution 
with shape parameter 2 one finds $\lambda_2 \approx 0.5646$ [28], and for a negative exponential 
distribution $\lambda_2 \approx 0.5770$ [29]. In Appendix B we present a rather straightforward 
calculation to show that $\lambda_2 \approx 0.5606$ 
for a gamma distribution with shape parameter 1/2. For large K and arbitrary 
distributions it has been conjectured that $\lambda_K$ grows asymptotically as $\exp[-\log(K)/K]$, 
but this was proven only for Gaussian and fat-tailed distributions. 

Any fitness function $F(\sigma)$ on the hypercube $\mathbb{H}_2^L$ can be expanded into eigenfunctions 
of the corresponding graph Laplacian [22, 35, 57]. The resulting transformation is a 
discrete analogue of the Fourier transform [58] and also known as the Walsh transform in 
computer science [25, 59]. It takes on a particularly simple form if the binary genotypes 
are represented by sequences $s \in \{-1,1\}^L$ which can be interpreted as configurations 
of a spin system. In this representation, the eigenfunctions of order p are proportional 
to products $s_{i_1} s_{i_2} \cdots s_{i_p}$ where $0 \le p \le L$ and the indices $i_1, i_2, \dots, i_p$ are a subset of 
$\{1, 2, \dots, L\}$. For the NK-model the expansion terminates at order p = K, and hence 
any NK fitness landscape can be written as 

$$F(s) = F_0 + \sum_{i=1}^{L} H_i s_i + \sum_{p=2}^{K} \sum_{i_1 \dots i_p} J_{i_1 \dots i_p} s_{i_1} \cdots s_{i_p} \qquad (7)$$

where the random “magnetic fields” $H_i$ and the “coupling constants” $J_{i_1 \dots i_p}$ are 
determined by the original set of random functions $f_i$. In most cases, the coefficients 
are extremely sparse as a particular coupling constant $J_{i_1 \dots i_p}$ is nonzero only if there is 
at least one neighborhood set V such that $i_j \in V$ for all $j \in \{1, \dots, p\}$. The number 
of nonzero coefficients is called the rank of the landscape [37] and will be used to 
characterize different neighborhood types below in section 3.4. 
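
The sparsity condition stated above suggests a purely combinatorial way of counting potentially nonzero coefficients. The sketch below rests on two assumptions that are not taken from the text: every index subset contained in some neighborhood set is taken to carry a nonzero coefficient (which holds with probability one for continuous distributions of the $f_i$), and only orders $p \ge 1$ are counted, so the constant $F_0$ is excluded. The function name `rank` is a hypothetical choice.

```python
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple


def rank(neighborhoods: Iterable[Sequence[int]]) -> int:
    """Count distinct non-empty index subsets contained in at least one V_i.

    Assumption: each such subset corresponds to one nonzero coefficient
    H_i or J_{i_1...i_p} in (7); the constant term F_0 is not counted.
    """
    subsets: Set[Tuple[int, ...]] = set()
    for V in neighborhoods:
        members = sorted(set(V))
        for p in range(1, len(members) + 1):
            subsets.update(combinations(members, p))
    return len(subsets)


if __name__ == "__main__":
    L, K = 9, 3
    block = [[K * (i // K) + j for j in range(K)] for i in range(L)]
    adjacent = [[(i + j) % L for j in range(K)] for i in range(L)]
    print("block:", rank(block), "adjacent:", rank(adjacent))
```

For L = 9 and K = 3 this count gives 21 for the block neighborhood (three blocks with 7 non-empty subsets each) and 36 for the adjacent neighborhood, illustrating that schemes with identical L and K can differ in rank.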

The Fourier spectrum of a fitness landscape is obtained from the Fourier expansion 
by summing the squared coefficients of a given order p. Suitably normalized, this 
provides a measure for the weight of genetic interactions of different orders, and thus 
a quantification of the ruggedness of the landscape. The Fourier spectrum 
is related to the fitness autocorrelation function through a one-dimensional linear 
transformation involving discrete orthogonal polynomials [22]. For the NK-model the 
Fourier spectrum, like the fitness autocorrelation function, is independent of the genetic 
interaction scheme and can be explicitly calculated [35]. In contrast, as we will see in 
section 3.4, the rank depends strongly on the choice of the genetic neighborhood. 

2.4. Adaptive walks 

An adaptive walk (AW) is an idealized evolutionary process. Rather than being treated as a 
set of individuals, the population behaves like a single entity that travels through the 
genotype space. Formally an AW is a Markov chain on $\mathbb{H}_2^L$ with dynamics defined by 
transition probabilities $p(\sigma \to \tau)$ for a step from genotype $\sigma$ to $\tau$. Such a step is allowed 
only if $\tau$ can be reached from $\sigma$ by a single beneficial mutation, i.e., if $d(\sigma, \tau) = 1$ and 
$F(\sigma) < F(\tau)$; otherwise $p(\sigma \to \tau)$ is zero. This means that an AW is restricted to 
moving along paths that are monotonically increasing in fitness and hence accessible in 
the sense defined above. The walk terminates on some genotype $\sigma$ 
if no further beneficial mutations are possible, i.e., when $\sigma$ is a local fitness maximum. 
The number of steps to the maximum will be called the length $\ell$ of the AW and the 
fitness at the maximum will be called its height h. 

Concerning the probabilities of allowed steps, we distinguish between different kinds 
of adaptive walks. 

Natural AW: The transition probabilities have the fitness dependent values 

$$p(\sigma \to \tau) = \frac{F(\tau) - F(\sigma)}{\sum_{\sigma'} \left[F(\sigma') - F(\sigma)\right]} \qquad (8)$$

where the sum runs over all fitter neighbors $\sigma'$ of $\sigma$. 

Random AW: Each step leads to a randomly chosen fitter neighbor. 

Greedy AW: Each step leads to the fittest available neighbor. 

Reluctant AW: A step is always taken to the least fit neighbor that is still fitter than 
the current genotype. 

Note that the dynamics of greedy and reluctant walks is completely deterministic on 
a given realization of the landscape. The dynamics of the natural AW is the most 
realistic, in the sense that it can be derived from individual based population models 
like the Wright-Fisher or Moran model. We will however not treat this 
walk type here because its dynamics is influenced by the distribution of fitness values 
[52, 53]. This is in contrast to the other walk types, where the behavior does not depend 
on the actual fitness values but only on their order. Greedy and random AWs can be 
interpreted as limits of more general and realistic dynamics [3], and at least for the 
HoC landscape natural AWs interpolate between them in terms of length [8]. The 
reluctant walk does not seem to have a biological interpretation and therefore has not 
been considered previously in the biological literature, but it appears in the context of 
spin glasses and optimization [55]. We will use it here as an additional tool 
for the analysis of fitness landscapes. 
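
The random, greedy and reluctant walk types defined above can be sketched in Python as follows. This is a minimal illustration, not the simulation code behind the results: genotypes are tuples of 0/1, the fitness landscape is passed in as an arbitrary function, and the helper `hoc_fitness` (a lazily generated HoC landscape with uniform fitness values) as well as all function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Genotype = Tuple[int, ...]
Fitness = Callable[[Genotype], float]


def fitter_neighbors(sigma: Genotype, fitness: Fitness) -> List[Tuple[float, Genotype]]:
    """All (fitness, genotype) pairs of single mutants that are fitter than sigma."""
    f0 = fitness(sigma)
    result = []
    for i in range(len(sigma)):
        tau = sigma[:i] + (1 - sigma[i],) + sigma[i + 1:]
        f = fitness(tau)
        if f > f0:
            result.append((f, tau))
    return result


def adaptive_walk(start: Genotype, fitness: Fitness, kind: str,
                  rng: random.Random) -> Tuple[int, float, Genotype]:
    """Run one adaptive walk; return (length, height, final genotype)."""
    if kind not in ("greedy", "random", "reluctant"):
        raise ValueError("kind must be 'greedy', 'random' or 'reluctant'")
    sigma, length = start, 0
    while True:
        candidates = fitter_neighbors(sigma, fitness)
        if not candidates:  # local fitness maximum reached
            return length, fitness(sigma), sigma
        if kind == "greedy":
            _, sigma = max(candidates)      # fittest available neighbor
        elif kind == "reluctant":
            _, sigma = min(candidates)      # least fit among the fitter neighbors
        else:
            _, sigma = rng.choice(candidates)  # uniformly chosen fitter neighbor
        length += 1


def hoc_fitness(rng: random.Random) -> Fitness:
    """House-of-Cards landscape with values drawn on first access (assumed uniform)."""
    table: Dict[Genotype, float] = {}

    def fitness(sigma: Genotype) -> float:
        if sigma not in table:
            table[sigma] = rng.random()
        return table[sigma]

    return fitness
```

Because each step strictly increases the fitness, no genotype is visited twice and the loop always terminates. Only the genotypes along the walk and their neighbors are ever evaluated, which is why such walks remain feasible for large L.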

A number of results are available for random and greedy AWs on the HoC 
landscape. For $L \gg 1$, the mean length of a random AW is given by $\ell_{\mathrm{HoC}} \simeq \ln L$, 
while the length of the greedy AW attains a constant limiting value 
$\ell_{\mathrm{HoC}} = e - 1 \approx 1.7183$ [62]. Using the results of [61, 62], in Appendix A.1 and 
Appendix A.2 we derive the mean value of the walk height for random and greedy AWs 
in the HoC landscape. Assuming without loss of generality that the fitness values are 
uniformly distributed on the interval [0,1], the mean height is of the form $\langle h \rangle = 1 - \alpha/L$ 
to leading order, where $\alpha$ is a constant depending on the walk type. To our knowledge, 
no rigorous results are available for the reluctant walk, but numerically it turns out 
that its mean length is given by $\ell_{\mathrm{HoC}} = L/2$ and the height constant is $\alpha = 1$ (see 
Appendix A.3). A summary of the mean walk lengths and heights can be found in 
table 1. Since a randomly chosen maximum has an average height of 1 − 1/(L + 2), the 
fact that $\alpha = 1$ for the reluctant AW implies that the maxima found by this dynamics 
are typical local maxima, whereas the random and greedy walks for which $\alpha < 1$ find 
exceptionally high peaks. Moreover, on the HoC landscape the greedy AW reaches 
higher fitness levels than the random AW. 

Table 1. Properties of adaptive walks on the House-of-Cards landscape with uniformly 
distributed fitness values. The derivation and exact values of $\alpha$ can be found in 
Appendix A. The results for the reluctant walk were obtained numerically. 

Walk type    Length $\langle \ell \rangle$    Height $\langle h \rangle = 1 - \alpha/L$
Greedy       $e - 1$                          $\alpha = 0.4003\ldots$
Random       $\log L$                         $\alpha = 0.6243\ldots$
Reluctant    $L/2$                            $\alpha = 1$


3. Results 

3.1. Exact results for the block model 

In the block model, each path can be decomposed into sub-paths, where each sub-path 
is confined to one specific block [27]. For example, for L = 4 and K = 2, the path 

$$\mathcal{E} = (0011) \to (1011) \to (1001) \to (1000) \to (1100)$$

can be decomposed into $\mathcal{E}^1 = (00) \to (10) \to (11)$ in the first block and 
$\mathcal{E}^2 = (11) \to (01) \to (00)$ in the second one. The first mutation occurs in block 1, the second and 
third mutation in block 2 and the fourth one again in block 1, but note that any other 
order would also lead to a valid path with the same endpoint. This means that, in order 
to construct the full path $\mathcal{E}$ from the $\mathcal{E}^i$, one also needs to know the order $\pi(\mathcal{E})$ in which 
the blocks are affected [36]. However, this order has no influence on the final genotype, 
the length of the path or its accessibility. 

One can easily show that the probability of an adaptive step $\sigma \to \tau$ in the full 
landscape, conditioned on taking place in block b, is equal to the probability of the 
corresponding step in the sub-landscape of that block. This is true under the fairly 
general condition that the transition probabilities depend only on fitness differences, 
which applies to all adaptive walk types defined in section 2.4. Hence the probability 
that a path $\mathcal{E}$ is taken in an adaptive walk is given by 

$$P(\mathcal{E}, C) = P(\pi(\mathcal{E}), C) \prod_{i=1}^{L/K} P(\mathcal{E}^i, C^i), \qquad (9)$$

where $P(\mathcal{E}, C)$ is the probability of taking path $\mathcal{E}$ in landscape C, C is the full landscape, 
$C^i$ is the sub-landscape of block i and $P(\pi(\mathcal{E}), C)$ is the probability for treating the blocks 
in the specific order $\pi(\mathcal{E})$. 
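
The decomposition of a path into block sub-paths and a block order can be made concrete with a small Python sketch. It is an illustration only: genotypes are written as strings of '0' and '1', blocks are numbered from 1 as in the example of this section, and the function name `decompose_path` is hypothetical.

```python
from typing import List, Sequence, Tuple


def decompose_path(path: Sequence[str], K: int) -> Tuple[List[List[str]], List[int]]:
    """Split a path on {0,1}^L into one sub-path per block of size K.

    Returns the list of sub-paths (one per block) and the order in which
    the blocks are hit by successive mutations (blocks numbered from 1).
    """
    L = len(path[0])
    if L % K != 0:
        raise ValueError("L must be an integer multiple of K")
    n_blocks = L // K
    # Every sub-path starts at the restriction of the initial genotype to its block.
    sub_paths = [[path[0][b * K:(b + 1) * K]] for b in range(n_blocks)]
    order = []
    for sigma, tau in zip(path, path[1:]):
        changed = [i for i in range(L) if sigma[i] != tau[i]]
        if len(changed) != 1:
            raise ValueError("consecutive genotypes must differ at exactly one locus")
        b = changed[0] // K  # block containing the mutated locus
        order.append(b + 1)
        sub_paths[b].append(tau[b * K:(b + 1) * K])
    return sub_paths, order


if __name__ == "__main__":
    example = ["0011", "1011", "1001", "1000", "1100"]
    print(decompose_path(example, K=2))
```

Applied to the example path given above for L = 4 and K = 2, the sketch returns the sub-paths (00) → (10) → (11) and (11) → (01) → (00) together with the block order 1, 2, 2, 1.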

As the order of blocks has no influence on the statistics we are interested in, 
namely the length and height of an adaptive walk, one can treat a walk in the full 
landscape simply as the succession of independent walks through the sub-landscapes. 
More precisely, if $\ell_i$ denotes the random variable which represents the length of the walk 
in block i, the length of the full walk is given by $\ell = \sum_i \ell_i$. In the standard block model, 
the mean walk length is accordingly given by 

$$\langle \ell \rangle = \sum_{i=1}^{L/K} \langle \ell_i \rangle = \frac{L}{K}\, \ell_{\mathrm{HoC}}(K). \qquad (10)$$

For a random AW this leads for instance to $\langle \ell \rangle \simeq (L/K) \log K$. This result was already 
obtained by Weinberger from an analysis of the density of local maxima [46], but 
appeared there as an approximation for adjacent and random neighborhoods rather 
than as an asymptotically exact statement for the block model. Interestingly, according 
to this argument the mean length of reluctant walks is given by $\langle \ell \rangle = L/2$ and does not 
even depend on K. In practice, the usefulness of the relation (10) relies on an accurate 
knowledge of mean walk lengths on the HoC landscape. Since the analytical expressions 
in table 1 are only valid asymptotically for large K, we include a small-K correction 
to $\ell_{\mathrm{HoC}}(K)$ that consists of two additional terms proportional to $1/K$ and $1/K^2$ with 
coefficients obtained by a least-squares fit to simulation data. 
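
Relation (10) is simple enough to evaluate directly. The sketch below uses only the asymptotic HoC expressions of table 1 and deliberately omits the fitted small-K correction, whose coefficients are not given here; it is therefore a rough estimate for small K. The function names are hypothetical.

```python
import math


def hoc_mean_length(K: int, kind: str) -> float:
    """Asymptotic mean walk length on a HoC landscape of size K (table 1)."""
    if kind == "greedy":
        return math.e - 1.0
    if kind == "random":
        return math.log(K)
    if kind == "reluctant":
        return K / 2.0  # numerical result quoted in the text, not a rigorous one
    raise ValueError("kind must be 'greedy', 'random' or 'reluctant'")


def block_mean_length(L: int, K: int, kind: str) -> float:
    """Mean walk length in the block model, (L/K) * l_HoC(K), cf. (10).

    Assumption: no small-K correction terms are included.
    """
    if L % K != 0:
        raise ValueError("L must be an integer multiple of K")
    return (L / K) * hoc_mean_length(K, kind)


if __name__ == "__main__":
    for K in (4, 8, 16):
        print(K, [round(block_mean_length(256, K, k), 2)
                  for k in ("greedy", "random", "reluctant")])
```

The reluctant walk yields L/2 for every K in this estimate, in line with the observation that its length in the block model does not depend on K.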

The same argument as for the length can be used to estimate the height of an 
adaptive walk. We have 

$$h = \sum_{i=1}^{L/K} h_i, \qquad (11)$$

where h is the height in the full landscape and $h_i$ the height in the i-th block. Since 
the $h_i$ have the same statistics as walk heights in the HoC model, one can compute the 
mean of h with the help of previous results. Using additivity of the mean value as well 
as equation (A.30) derived in Appendix A.2, we obtain 

3 - 7 ' 

( 12 ) 


W = | (ft.) « 


a e~ 


K 


where Q is the cumulative distribution function of the height within a block, $\gamma \approx 0.5772$ 
is the Euler-Mascheroni constant and $\alpha$ depends on the walk type (see table 1). Note 
that the second approximation is only valid for fitness distributions from the Gumbel 
class of extreme value theory [63] and is applied here for a normal distribution with zero 
mean and variance K. 


3.2. Comparison of standard neighborhood types 

In the block neighborhood, both mean length $\langle \ell \rangle$ and height $\langle h \rangle$ of AWs are linear in 
the sequence length L (if K is fixed), as can be seen directly from (10) and (12). Strictly 
speaking, this is not true for the other neighborhood types, since the genetic sequences 
cannot be divided into independent blocks anymore. However, as shown in figure 2, for 
$L \gg K$ the linear behavior approximately applies for all interaction schemes. When L 
is comparable to K the linear behavior only changes slightly, which leads to a linear 
regression with almost vanishing intercept. A notable exception is the reluctant walk 
on a landscape with random neighborhoods, where the intercept is negative and very 
large compared to the slope [inset of figure 2(a)]. 

Figure 2. Adaptive walk length (a) and height (b) for the NK model with fixed K = 8 
and varying L. The symbols used for the different walk and neighborhood types are 
explained in panel (b). Lines correspond to linear regressions. 


The slope of the linear L-dependence of $\langle \ell \rangle$ and $\langle h \rangle$ in figure 2 differs markedly 
between different neighborhood and walk types. As in the HoC model, the greedy walk 
has the shortest length of all the walk types, the reluctant walk is the longest and random 
adaptive walks are in between. The neighborhood type has an influence on the length 
which is comparable in strength to that of the walk type. For a given walk type, random 
neighborhoods facilitate longer walks than adjacent neighborhoods, which in turn give 
rise to longer walks than the block neighborhood. The influence of the neighborhood 
on walk length is most pronounced for the reluctant walk [see the inset of figure 2(a)]. 
The ordering of neighborhood types remains the same if one looks at the walk height 
instead of length, but the situation regarding the different walk types is more complex 
[figure 2(b)]. While for the adjacent and block neighborhood the height increases with 
the “greed” of the walk, the order is reversed for the random neighborhood. However, 
this is not the case in general but only for suitable choices of K (see also figure 4). 

Both lengths and heights of AWs depend sensitively on K, with the length of 
reluctant walks in the block model being the only exception. Figure 3 shows the 
dependence on K for different choices of the neighborhood and walk type. For all values 
of K (except K = 1 and K = L) both walk length and height are consistently largest 
with random neighborhood, second largest with adjacent neighborhood and smallest 
with block neighborhood. The difference between neighborhood types is most apparent 
for intermediate values of K. This is not surprising, because for K = 1 and K = L 
all neighborhood types are equivalent and the behavior is expected to change smoothly 
with K. 

Figure 3. Adaptive walk length (a)-(c) and height (d)-(f) for the NK model with fixed 
L = 256 and varying K. Note that different scalings of the mean lengths $\langle \ell \rangle$ have been 
applied in panels (a)-(c) in order to emphasize the differences between neighborhood 
types. Solid lines correspond to the analytical expressions for the block neighborhood 
given by (10) and (12), respectively. Symbols are explained in panel (a), and dashed 
lines are for visual guidance. 

Since the local maxima of the fitness landscape are the absorbing states for adaptive 
walks, their number $N_{\max}$ should be inversely correlated with the length of adaptive 
walks. In agreement with other studies [36, 37], our findings thus suggest that, for given 
values of L and K, $N_{\max}$ is largest in a landscape with block neighborhood, slightly 
decreased for adjacent neighborhood and the smallest for random neighborhoods. 
Moreover, assuming that the number of maxima generally increases with K, this should 
result in a decrease of $\langle \ell \rangle$ which can indeed be observed for greedy and random AWs 
(but note that this is not visible in figure 3 because of the scaling of $\langle \ell \rangle$). However, 
reluctant walks show an unexpected departure from this pattern. The reluctant walk 
length is constant in K in the block model and displays a non-monotonic behavior for 
adjacent and random neighborhoods [figure 3(c)]. In particular, the combination of the 
reluctant walk and random neighborhoods results in extremely long walks with a length 
that is several times larger than the diameter L of the genotype space. This implies 
that on average each site in the sequence mutates several times before a local maximum 
is reached. 

Similarly, the height of adaptive walks should be related to the height of local 
maxima. In previous work it was found that the height of an average local maximum 
in the NK-model decreases asymptotically as $\sqrt{\log(K)/K}$ for large K [29, 36]. The 
relevance of this effect in the present context should however not be overestimated, 
since the results described in section 2.4 for the HoC landscape show that adaptive 
walks generally do not terminate on random maxima but on particularly high ones. As 
more maxima become available with increasing K, the walks might find higher ones even 
though their average height decreases. Be that as it may, the resulting dependence of 
$\langle h \rangle$ on K is not monotonic and has a maximum at rather small values of K [figures 3(d)-(f)]. 
By changing the neighborhood from block over adjacent to random, this maximum 
becomes more pronounced and is shifted slightly to larger K. 

Figure 4. Differences in walk height for an L = 256 landscape with random 
neighborhood. Lines are for visual guidance. 

Concerning the different walk types, the behavior of $\langle h \rangle$ looks qualitatively similar 
at first glance. However, when comparing the walk types on the same landscape model a 
more interesting picture emerges. In the block model greedy walks attain a larger height 
than random AWs and reluctant walks reach the lowest heights, as would be expected 
from the results on the HoC landscape. For adjacent neighborhoods, the order of the 
heights remains the same, but the differences become smaller. However, for random 
neighborhoods, one can find values of K where this order is reversed, implying that 
the reluctant walks are most efficient in locating high fitness values (see figure 4). This 
effect was also observed previously for similar landscape models in a different context 
[57]. 

3.3. Clustering of local maxima 

In addition to the number of local fitness maxima, their distribution in sequence 
space should also be expected to affect the behavior of adaptive walks. Even on the 
uncorrelated HoC landscape, the probability that a randomly chosen genotype is a 
local maximum is given by $p_{\max} = 1/(L + 1)$ while the probability that two genotypes 
at distance 2 are both maxima is given by $p_{\max,2} = 1/[L(L + 1)] > p_{\max}^2$ [64], i.e., in the 
proximity of a local maximum it is more probable to find another. This effect is weak on 
the HoC landscape, but on a correlated landscape the clustering of maxima can become 
quite pronounced, as has been repeatedly noted in the NK literature [65]. 

For the block model, this effect is easy to quantify. A genotype $\sigma$ is a fitness 
maximum if and only if its blocks correspond to maxima in their sub-landscapes. Since 
each of the L/K blocks is an independent HoC landscape of size K, the corresponding 
probability $p_{\max}$ is given by 

$$ p_{\max} = \left( \frac{1}{K+1} \right)^{L/K}. \tag{12} $$

Now let $\tau$ be a second genotype that is randomly chosen under the constraint $d(\sigma, \tau) = 2$. 
In order for $\tau$ to be a local maximum as well, the two loci in which $\tau$ differs from $\sigma$ have to be 
within the same block, which is fulfilled with probability $(K - 1)/(L - 1)$, and there must 
be another local maximum at this position in the block, which is true with probability 
$1/K$. Therefore, the probability $p_{\max,2}$ is given by 

$$ p_{\max,2} = p_{\max} \, \frac{K - 1}{K (L - 1)}, \tag{13} $$

which is vastly larger than $p_{\max}^2$ for sufficiently large L. The clustering of local maxima 
can also be observed for the other neighborhood types. In figure 5(a) we display the 
distribution of distances between local maxima, showing that the clustering of maxima 
is strongest for the block neighborhood while it is weakest for the random neighborhood. 
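
As an illustration that is not part of the original analysis, the block-model expressions above can be evaluated exactly with rational arithmetic. The sketch below assumes that K divides L and uses the block-model form $p_{\max} = (K+1)^{-L/K}$ together with (13); the function names are hypothetical. The last function multiplies the conditional probability $p_{\max,2}/p_{\max}$ by the number $\binom{L}{2}$ of genotypes at distance 2, which gives the mean number of maxima surrounding a randomly chosen maximum.

```python
from fractions import Fraction
from math import comb

def p_max_block(L, K):
    """Probability that a random genotype is a local maximum (block model)."""
    assert L % K == 0, "the block model requires K to divide L"
    return Fraction(1, K + 1) ** (L // K)

def p_max2_block(L, K):
    """Probability that two genotypes at distance 2 are both maxima, eq. (13)."""
    return p_max_block(L, K) * Fraction(K - 1, K * (L - 1))

def n_sur_block(L, K):
    """Mean number of maxima at distance 2 from a randomly chosen maximum."""
    return comb(L, 2) * p_max2_block(L, K) / p_max_block(L, K)

L, K = 20, 5
enhancement = p_max2_block(L, K) / p_max_block(L, K) ** 2
print(float(enhancement))    # how much more likely than for independent maxima
print(n_sur_block(L, K))     # simplifies to L (K - 1) / (2 K)
```

For K = L the last function reduces to (L - 1)/2, the HoC value quoted in the caption of figure 5.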

The analysis in figure 5(a) was restricted to rather small landscapes of size L = 20 
where it is feasible to exhaustively sample all genotypes. For much larger landscapes 
this is no longer possible and it is very difficult to devise an unbiased search algorithm 
that randomly samples local maxima. For this reason, we simply consider local maxima 
that were found by an adaptive walk and determine the mean number $N_{\mathrm{sur}}$ of maxima 
surrounding such a maximum, i.e., those at the minimal distance d = 2. The result 
is shown in figure 5(b). Apparently, the walk type does not have a large impact on 
$N_{\mathrm{sur}}$, but the neighborhood type does. For intermediate values of K, the number of 
surrounding maxima in the random neighborhood differs from the results for block 
neighborhoods by a factor of almost 50, while the results for the adjacent neighborhood 
lie, as always, roughly halfway between block and random neighborhood. To assess how 
strongly these results are biased by the sampling of the maxima by an AW, one may 
compare the numerical results for the block model to the corresponding prediction for 
randomly chosen local maxima derived from (13). It is seen in figure 5(b) that $N_{\mathrm{sur}}$ is 
slightly larger for randomly chosen maxima, implying that the maxima found during an 
AW are more isolated than the typical ones. Nevertheless, the effect of sampling bias 
appears to be rather minor, and we conclude that the study of $N_{\mathrm{sur}}$ exposes one of the 
most recognizable differences between neighborhood types. 

Figure 5. Clustering of local maxima. (a) Distribution $Q_{\max}$ of the distance d 
between pairs of local maxima, normalized to the corresponding distribution $Q_{\mathrm{gen}}$ of 
all genotypes. Landscape parameters are L = 20 and K = 5. Inset shows $Q_{\max}/Q_{\mathrm{gen}}$ 
at distance d = 2 as a function of L for fixed L/K = 4. (b) Mean number $N_{\mathrm{sur}}$ of 
local maxima that are at distance d = 2 to the final genotype of an adaptive walk on 
an L = 128 landscape. Circles correspond to greedy, squares to random and crosses 
to reluctant walks, though the walk type does not have a large impact on the result. 
Solid lines are for visual guidance. The dashed line shows $N_{\mathrm{sur}}$ according to (13) 
for a randomly chosen maximum in the block model. In the limiting case K = L 
corresponding to the HoC model, $N_{\mathrm{sur}} = (L - 1)/2$ independent of the interaction 
scheme. 


3.4. Neighborhood rank 

The rank r of NK-neighborhoods was introduced in [37] to quantify neighborhood 
schemes, and it was shown numerically to be negatively correlated to the number of 
maxima of a landscape if K and L are kept constant. The rank of an NK neighborhood 
scheme is defined as 

$$ r(V) = \left| \bigcup_{i=1}^{L} \mathcal{P}(V_i) \right|, \tag{14} $$

where $\mathcal{P}(M)$ and $|M|$ denote the power set and counting measure, respectively, of a 
set M. A more convenient but equivalent definition of the rank can be given in terms 
of the Fourier expansion (7), where it is equal to the number of non-zero coupling 
constants (including the $H_i$ and $F_0$). In this section, we will first calculate the rank for 
the classic neighborhoods of block, adjacent and random type. We will then generate 
neighborhoods of arbitrary rank that interpolate between these types and show that the 
AW-based landscape measures considered in the previous sections are correlated to the 
rank as well. 


3.4.1. Calculation of the rank 

Figure 6. Illustration of the calculation of the rank for the adjacent neighborhood 
model. A grey square in the i-th row and j-th column means that $V_i$ and $V'_i$, 
respectively, contains j. The left panel shows an adjacent neighborhood scheme for 
L = 6 and K = 3, and the right panel shows the extended scheme for which the rank 
is actually calculated. 

Block neighborhood: For the block model the rank is straightforward to obtain. Each 
block contains every subset of size smaller or equal to K, giving a contribution of $2^K$ 
to the rank. Since there are L/K blocks and the empty set is counted only once, we 
obtain 

$$ r_{\mathrm{blc}} = \frac{L}{K} \left( 2^K - 1 \right) + 1. \tag{15} $$

Adjacent neighborhood: We will show that the rank of the adjacent neighborhood is 
given by 

$$ r_{\mathrm{adj}} = 1 + L \cdot 2^{K-1} \tag{16} $$

for $K \le (L + 1)/2$. For the calculation we define sets $V'_i$ which are the same as the 
standard neighborhood sets $V_i$ from (5) but without taking the elements modulo L, i.e., 
the $V'_i$ contain elements up to $L + K - 1$ (see figure 6). Furthermore define 

$$ \mathcal{M}' = \left\{ M \in \bigcup_{i=1}^{L} \mathcal{P}(V'_i) \;\middle|\; \min(M) \le L \right\}. \tag{17} $$

It is straightforward to count the number of elements in $\mathcal{M}'$. A non-empty set $M \subset \mathbb{N}$ is contained 
in $\mathcal{M}'$ if and only if 

$$ 1 \le \min(M) \le L \quad \text{and} \quad \max(M) - \min(M) < K. \tag{18} $$

For given $a = \min(M)$ and $b = \max(M) > a$, there are $2^{b-a-1}$ possible sets. Summing over 
all a and b which fulfill (18), and counting the empty set and the L single-element sets 
separately, leads to 

$$ |\mathcal{M}'| = 1 + L + \sum_{a=1}^{L} \sum_{b=a+1}^{a+K-1} 2^{b-a-1} = 1 + L + L \cdot \sum_{d=0}^{K-2} 2^{d} = 1 + L \cdot 2^{K-1}. \tag{19} $$

We will now show that indeed $|\mathcal{M}'| = |\mathcal{M}| = r_{\mathrm{adj}}$ for $K \le (L + 1)/2$, where 
$\mathcal{M} = \bigcup_{i=1}^{L} \mathcal{P}(V_i)$ is the set counted in (14). Note, however, 
that $|\mathcal{M}'| > |\mathcal{M}|$ for $K > (L + 1)/2$ and hence (19) overestimates the actual rank in 
this case. We define the function mod: $\mathcal{M}' \to \mathcal{M}$ by 

$$ \mathrm{mod}(M') = \{ i \bmod L \mid i \in M' \} \tag{20} $$

and show that it is bijective if $K \le (L + 1)/2$. In fact the mod-function is always 
surjective, but only injective if $K \le (L + 1)/2$. Let $M = \{m_1, \dots, m_n\} \in \mathcal{M}$ with 
$m_1 < m_2 < \dots < m_n$. Define 

$$ M' = \begin{cases} M & \text{if } m_n - m_1 < K \\ \{ g(m_1), \dots, g(m_n) \} & \text{else} \end{cases} \tag{21} $$

where 

$$ g(m) = \begin{cases} m & \text{if } m > K \\ m + L & \text{else.} \end{cases} \tag{22} $$

Since $M' \in \mathcal{M}'$ and $\mathrm{mod}(M') = M$, the function is surjective and hence $|\mathcal{M}'| \ge |\mathcal{M}|$. 
Now let $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_n\}$, where $a_k < a_{k+1}$ and $b_k < b_{k+1}$ for 
all k, be two elements of $\mathcal{M}'$ with $\mathrm{mod}(A) = \mathrm{mod}(B)$. We will show that either $A = B$ 
or $K > (L + 1)/2$, i.e., mod is injective if $K \le (L + 1)/2$. We denote by i and j the 
smallest indices that fulfill $a_i > L$ and $b_j > L$, respectively, which means that 

$$ \mathrm{mod}(A) = \{ a_i - L, \dots, a_n - L, a_1, \dots, a_{i-1} \} = \{ b_j - L, \dots, b_n - L, b_1, \dots, b_{j-1} \} = \mathrm{mod}(B). \tag{23} $$

Note that the elements of $\mathrm{mod}(A)$ and $\mathrm{mod}(B)$ are written in ascending order in (23). 
Therefore, it is obvious that $A = B$ if $i = j$ and thus we assume without loss of generality 
that $i < j$. By comparison of the elements one finds that $a_n - L = b_{j-i}$ and $a_1 = b_{j-i+1}$. 
Because both A and B are elements of $\mathcal{M}'$, the conditions $a_n - a_1 = b_{j-i} - b_{j-i+1} + L < K$ 
and $b_{j-i+1} - b_{j-i} < K$ have to be fulfilled, which finally leads to 

$$ K > b_{j-i+1} - b_{j-i} > L - K \quad \Rightarrow \quad K > \frac{L + 1}{2}, \tag{24} $$

where the last step uses that all quantities involved are integers. 
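
The counting arguments for the block and adjacent schemes can be cross-checked by brute force for small systems. The following Python sketch is an illustration only: it enumerates all subsets of all neighborhood sets directly from the definition (14), with sites labelled 0, ..., L-1 instead of 1, ..., L (an assumption made for convenience), and compares the result with (15) and (16). The helper names are hypothetical.

```python
from itertools import combinations

def rank(neighborhoods):
    """Count distinct subsets contained in at least one neighborhood set, eq. (14)."""
    subsets = set()
    for v in neighborhoods:
        elems = sorted(v)                      # sorted tuples identify equal subsets
        for size in range(len(elems) + 1):     # size 0 adds the empty set once
            subsets.update(combinations(elems, size))
    return len(subsets)

def block(L, K):
    """Block scheme: V_i is the block of K consecutive sites that contains i."""
    return [set(range(K * (i // K), K * (i // K) + K)) for i in range(L)]

def adjacent(L, K):
    """Adjacent scheme: V_i = {i, i+1, ..., i+K-1} modulo L."""
    return [{(i + d) % L for d in range(K)} for i in range(L)]

L, K = 12, 4
print(rank(block(L, K)), (L // K) * (2**K - 1) + 1)    # brute force vs (15)
print(rank(adjacent(L, K)), 1 + L * 2**(K - 1))        # brute force vs (16)
print(rank(adjacent(4, 3)), 1 + 4 * 2**2)              # K > (L+1)/2: (16) overestimates
```

The enumeration costs of order $L \cdot 2^K$ operations, so it is only practical for moderate K.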


Random neighborhood: Due to the random choice of neighbors, the rank is a random 
variable in this case, and we will calculate its expectation value. First the probability 
that a given set W is a subset of $V_i$ is needed. Since $V_i$ always contains i, the probability 
depends on whether $i \in W$ or not. We find 

$$ P(W \subset V_i \mid i \in W) = \frac{(K-1)! \, (L-m)!}{(L-1)! \, (K-m)!} =: p_m \tag{25} $$

and 

$$ P(W \subset V_i \mid i \notin W) = \frac{(K-1)! \, (L-1-m)!}{(L-1)! \, (K-1-m)!} = p_m \, \frac{K-m}{L-m}, \tag{26} $$

where m is the number of elements of W. Hence the probability $q_m$ that W is contained 
in at least one of the neighborhood sets is given by 

$$ \begin{aligned} q_m &= P(\exists i: W \subset V_i) \\ &= P(\exists i \in W: W \subset V_i) + P(\exists i \notin W: W \subset V_i) \\ &= 1 - (1 - p_m)^m + 1 - \left( 1 - p_m \, \frac{K-m}{L-m} \right)^{L-m} \end{aligned} \tag{27} $$

for $m \ge 2$ and obviously $q_0 = q_1 = 1$. There are $\binom{L}{m}$ such subsets W for each size m 
and hence the mean rank is given by 

$$ \langle r_{\mathrm{rnd}} \rangle = \sum_{m=0}^{K} \binom{L}{m} q_m. \tag{28} $$

The result can be simplified by using the approximation $(1 - x)^n \approx 1 - nx$ in (27), which 
is valid when $p_m$ is very small. This yields $q_m \approx K \cdot p_m$ and hence, using 
$\binom{L}{m} p_m = \frac{L}{K} \binom{K}{m}$, 

$$ \begin{aligned} \langle r_{\mathrm{rnd}} \rangle &\approx 1 + L + \sum_{m=2}^{K} \binom{L}{m} K \, p_m && \text{(29)} \\ &= 1 + L \left( 2^K - K \right) = r_{\max}, && \text{(30)} \end{aligned} $$

where $r_{\max}$ is the upper limit for the rank of a neighborhood with fixed L and K [37]. 
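
To get a feeling for the size of the correction neglected in the linearization, one may evaluate the expressions above for concrete parameters. The sketch below is an illustration under the assumption that (25), (27) and (28) have the forms $p_m = (K-1)!(L-m)!/[(L-1)!(K-m)!]$, $q_m = 1 - (1-p_m)^m + 1 - (1 - p_m (K-m)/(L-m))^{L-m}$ and $\langle r \rangle = \sum_m \binom{L}{m} q_m$; the function names are hypothetical and L > K is required.

```python
from fractions import Fraction
from math import comb, factorial

def p(m, L, K):
    """p_m of eq. (25): probability that W (|W| = m, i in W) is a subset of V_i."""
    return Fraction(factorial(K - 1) * factorial(L - m),
                    factorial(L - 1) * factorial(K - m))

def q(m, L, K):
    """q_m as written in eq. (27); q_0 = q_1 = 1."""
    if m < 2:
        return Fraction(1)
    pm = p(m, L, K)
    inside = 1 - (1 - pm) ** m                                   # some i in W
    outside = 1 - (1 - pm * Fraction(K - m, L - m)) ** (L - m)   # some i not in W
    return inside + outside

def mean_rank(L, K):
    """Mean rank of a random neighborhood according to eq. (28)."""
    return sum(comb(L, m) * q(m, L, K) for m in range(K + 1))

def mean_rank_linearized(L, K):
    """Same sum with q_m replaced by K * p_m for m >= 2, eq. (29)."""
    return 1 + L + sum(comb(L, m) * K * p(m, L, K) for m in range(2, K + 1))

L, K = 20, 4
print(float(mean_rank(L, K)))            # eq. (28)
print(mean_rank_linearized(L, K))        # eq. (29)
print(1 + L * (2**K - K))                # r_max, eq. (30)
```

Because $(1-x)^n \ge 1 - nx$, the value from (28) never exceeds $r_{\max}$, and the two agree closely only when K is much smaller than L.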

3.4.2. Correlation between walk properties and rank. All quantities we 
analyzed in section 3.2 were either minimal or maximal for block and random 
neighborhoods, with the values for the adjacent model lying in between. It is therefore 
not surprising that the same holds true for the rank, which is minimal for block and 
maximal for random neighborhoods. This is not a coincidence, as most quantities seem 
to be generally correlated to the rank, as we are now going to show by analyzing 
neighborhoods with arbitrary rank. To generate these neighborhoods, we use the 
following algorithm: 

(i) Start with a block neighborhood. 

(ii) Choose randomly a set $V_i$ and an element $n \in V_i$ with $n \neq i$, and replace it by another 
element $m \notin V_i$. 

(iii) If the rank has been increased due to this operation, the change in $V_i$ is accepted. 
Otherwise, the change is undone. 

(iv) If no rank increasing changes are found in 1000 successive trials, we start again at 
step (i). Otherwise, we continue with step (v). 

(v) When the rank hits a prescribed threshold, an adaptive walk is performed with the 
current neighborhood sets. 

(vi) Go to step (ii). 
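
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the simulations: it handles a single rank threshold, returns the neighborhood sets instead of running an adaptive walk, recomputes the rank by brute force after every trial move, and labels sites 0, ..., L-1. The names `raise_rank`, `max_fail` and `target` are hypothetical, and `target` is assumed not to exceed the largest rank reachable by such moves (otherwise the loop does not terminate).

```python
import random
from itertools import combinations

def rank(neighborhoods):
    """Count distinct subsets contained in at least one neighborhood set."""
    subsets = set()
    for v in neighborhoods:
        elems = sorted(v)
        for size in range(len(elems) + 1):
            subsets.update(combinations(elems, size))
    return len(subsets)

def block(L, K):
    """Block scheme: V_i is the block of K consecutive sites that contains i."""
    return [set(range(K * (i // K), K * (i // K) + K)) for i in range(L)]

def raise_rank(L, K, target, max_fail=1000, rng=None):
    """Increase the rank by accepted single-element swaps until it reaches `target`."""
    rng = rng or random.Random()
    sites = set(range(L))
    while True:
        hoods = block(L, K)                        # step (i)
        r, fails = rank(hoods), 0
        while r < target and fails < max_fail:     # step (iv): restart after max_fail misses
            i = rng.randrange(L)                   # step (ii): choose V_i, n and m
            n = rng.choice(sorted(hoods[i] - {i}))
            m = rng.choice(sorted(sites - hoods[i]))
            hoods[i].remove(n)
            hoods[i].add(m)
            new_r = rank(hoods)
            if new_r > r:                          # step (iii): keep rank-increasing moves
                r, fails = new_r, 0
            else:                                  # otherwise undo the move
                hoods[i].remove(m)
                hoods[i].add(n)
                fails += 1
        if r >= target:                            # step (v): threshold reached
            return hoods, r

hoods, r = raise_rank(L=12, K=3, target=40, rng=random.Random(1))
print(r, sorted(hoods[0]))
```

Every move keeps i in $V_i$ and leaves $|V_i| = K$ unchanged, so the output is always a valid NK neighborhood scheme. For large L and K an incremental update of the rank would be preferable to the full recount used here.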

With this method, we can produce thousands of different neighborhood schemes with 
ranks between $r_{\mathrm{blc}}$ and $r_{\max}$, although the maximal rank that can be achieved in this way 
is usually somewhat below $r_{\max}$. The results are shown in figure 7. Clearly, all quantities 
considered in section 3.2, i.e., the mean walk length $\langle \ell \rangle$, height $\langle h \rangle$ and the number $N_{\mathrm{sur}}$ 
of local maxima at distance d = 2 to the final genotype of the walk, are strongly related 
to the rank and either increase or decrease monotonically with r. For both height and 
length it turns out that the different walk types react differently to variations of the 
rank, whereas $N_{\mathrm{sur}}$, similar to the results shown in figure 5(b), is largely independent 
of the walk type. The length of greedy and random adaptive walks increases roughly 
linearly with rank, with a larger slope for random AWs. Reluctant walks are more 
susceptible to alterations of the rank, the dependence of walk length on r being stronger 
than linear. 

Figure 7. Correlation between the rank and several properties of adaptive walks on 
an L = 128, K = 8 landscape. The quantities shown are (a) the mean walk length $\langle \ell \rangle$, 
(b) the mean walk height $\langle h \rangle$, (c) the number $N_{\mathrm{sur}}$ of maxima at distance d = 2 to the 
final genotype of the walk and (d) the height advantage $\Delta h$ of greedy adaptive walks. 
All panels are plotted against the neighborhood rank. Lines are for visual guidance. 

Walk heights show a similar behavior as the length in terms of the sensitivity of 
different walk types. Reluctant walks do show a roughly linear dependence on the rank 
here, but the slope is larger than that for random and greedy walks. Since reluctant and 
random walks reach a lower height for the minimal rank $r_{\mathrm{blc}}$, their height-rank curve 
may have a point of intersection with the curve for the greedy walk. This point marks 
the threshold where reluctant and random AWs become more successful in their ability 
to find large fitness values. Such a point exists for the landscape parameters L = 128 
and K = 8 chosen here, but not in general. As suggested by figure 2, an intermediate 
value of K compared to L is needed to observe this phenomenon, and L has to be 
sufficiently large. 
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In this paper we studied different adaptive walk models on the NK landscape with the 
focus on the differences between interaction schemes of the NK model. In section 3.2 we 
analyzed three classic neighborhood types as well as three walk types, resulting in nine 
different combinations. The picture that we obtain is nevertheless rather simple: For 
each walk type, both the mean walk length $\langle \ell \rangle$ and height $\langle h \rangle$ are largest for random, 
second largest for adjacent and smallest for block neighborhoods, while the order is 
reversed for the number $N_{\mathrm{sur}}$ of maxima surrounding the final genotype of a walk. 
Similarly, for each neighborhood type, $\langle \ell \rangle$ is largest for reluctant, second largest for 
random and smallest for greedy walks. In most situations, the opposite ordering applies 
to $\langle h \rangle$, but for random neighborhoods and certain values of K and L this order can be 
reversed and reluctant walks become the most successful ones in terms of height. 

In section 3.4 we showed that this picture can be extended to more general choices 
of the neighborhood which can be classified in terms of the rank. Block, adjacent and 
random neighborhoods are just examples of schemes with low, medium and high rank, 
respectively. Our findings concerning the relation between walk length and rank are 
consistent with results from previous work, since $\ell$ would be expected to be related 
to the density of local maxima which decreases slightly with increasing rank [37]. In 
this sense, an increasing rank decreases the ruggedness of a landscape. Note that this 
is also consistent with another measure of ruggedness, namely the probability to find 
an accessible path to the global maximum, which was found to be largest for random, 
second largest for adjacent and smallest for block neighborhoods [36]. 

If the number of local maxima decreases with increasing rank, one would expect the 
same trend for the number of maxima $N_{\mathrm{sur}}$ surrounding a given maximum. Though this 
is true, the effect on $N_{\mathrm{sur}}$ is much stronger than that on the walk length and the number 
of maxima, which indicates that the rank affects the distribution of maxima in the 
landscape more substantially than their density. Maxima become much more isolated 
with increasing rank. The fact that $N_{\mathrm{sur}}$ hardly depends on the walk type suggests that 
this is true for typical local maxima and not only for those found by adaptive walks. 

The rank thus appears to be a powerful tool for the characterization and description 
of neighborhood schemes, but so far it lacks explanatory power. In fact, it is quite 
surprising that the ruggedness decreases with the rank for fixed L and K , since the 
opposite is true if the neighborhood type is fixed and the rank is increased due to an 
increase of K [66] . Because of the difficulty of analytical approaches to the NK model 
for neighborhoods that are not block-like as well as the impossibility to exhaustively 
enumerate the entire landscape for large L , we are for now restricted to indirect 
measurements using adaptive walks. Nevertheless, we showed that the model is rich 
in interesting and non-trivial phenomena and hope that the dependence of landscape 
properties on interaction schemes will be investigated more frequently in future work. 

With regard to the application of probabilistic models for the interpretation of 
empirical fitness landscapes, our work highlights the importance of developing refined 
measures of genetic interactions that go beyond the summary statistics of fitness 
landscape ruggedness considered in most previous studies. Moreover, our 
demonstration that different types of adaptive walks respond differently to the structure 
of these interactions suggests a new methodology for exploring high-dimensional 
empirical data sets, where adaptive walks have so far been employed only for estimating 
the correlation length and overall density of local maxima in the landscape. 
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Appendix A. Adaptive walks on the HoC landscape 

Here we derive several quantities for adaptive walks on the uncorrelated House-of-Cards 
landscape. Although the properties of the walks do not depend on the underlying fitness 
distribution, for convenience the fitness values are assumed to be uniformly distributed 
on the interval [0,1]. 


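Appendix A.1. Height of greedy adaptive walks 

Calculating the mean walk height for greedy adaptive walks is rather simple. If the 
walk has length $\ell$, the population sees in total $(\ell + 1) \cdot L$ genotypes and chooses the one 
with largest fitness. The mean value of the largest of n i.i.d. uniform random variables 
is given by $M_n = n/(n + 1)$, the probability that the walk has length $\ell$ is given by 
$P_\ell = \ell/(\ell + 1)!$ [62] and hence the mean walk height is given by 

$$ \begin{aligned} \langle h \rangle &= \sum_{\ell=0}^{\infty} M_{(\ell+1) \cdot L} \, P_\ell = \sum_{\ell=0}^{\infty} \frac{(\ell+1) \cdot L}{(\ell+1) \cdot L + 1} \, \frac{\ell}{(\ell+1)!} && \text{(A.1)} \\ &\approx 1 - \sum_{\ell=0}^{\infty} \frac{1}{(\ell+1) \cdot L} \, \frac{\ell}{(\ell+1)!} = 1 - \frac{\alpha_{\mathrm{grd}}}{L}, && \text{(A.2)} \end{aligned} $$

where 

$$ \alpha_{\mathrm{grd}} = \sum_{\ell=0}^{\infty} \frac{\ell}{(\ell+1) \cdot (\ell+1)!} = 0.4003\ldots \tag{A.3} $$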
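
The numerical value of $\alpha_{\mathrm{grd}}$ and the quality of the large-L approximation are easy to check. The short Python sketch below is an illustration only; it assumes that the sums run over all walk lengths $\ell \ge 0$ with $P_\ell = \ell/(\ell+1)!$, truncates them after a fixed number of terms, and uses hypothetical function names.

```python
from math import factorial

def alpha_grd(terms=30):
    """Truncated series for alpha_grd = sum_l l / ((l + 1) (l + 1)!)."""
    return sum(l / ((l + 1) * factorial(l + 1)) for l in range(terms))

def greedy_height(L, terms=30):
    """Mean height from the sum over M_{(l+1)L} * P_l, before the large-L expansion."""
    return sum((l + 1) * L / ((l + 1) * L + 1) * l / factorial(l + 1)
               for l in range(terms))

L = 100
print(alpha_grd())                                   # close to 0.4003...
print(greedy_height(L), 1 - alpha_grd() / L)         # full sum vs 1 - alpha_grd / L
```

The factorial in the denominator makes the series converge so quickly that thirty terms are far more than needed.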


Appendix A.2. Height of random adaptive walks 


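The probability density $q_L(x)$ for the height of random adaptive walks on a uniformly 
distributed HoC landscape is known [51], so we will just compute its average. The 
density function of the height is given by 

$$ q_L(x) = x^{L-1} \cdot \exp\left( \sum_{k=1}^{L-1} \frac{x^k}{k} \right). \tag{A.4} $$

To extract the behavior for large L we define the rescaled function 

$$ \begin{aligned} r_L(x) &:= \frac{q_L(1 - x/L)}{q_L(1)} && \text{(A.5)} \\ &= \left( 1 - \frac{x}{L} \right)^{L-1} \exp\left( \sum_{k=1}^{L-1} \frac{(1 - x/L)^k - 1}{k} \right). && \text{(A.6)} \end{aligned} $$

The exponent, which we denote by $\mathcal{E}(x, L)$, can be written as 

$$ \begin{aligned} \mathcal{E}(x, L) &= \sum_{k=1}^{L-1} \sum_{s=1}^{k} \frac{(k-1)!}{s! \cdot (k-s)!} \left( -\frac{x}{L} \right)^s && \text{(A.7)} \\ &= \sum_{s=1}^{L-1} \sum_{k=s}^{L-1} \frac{(k-1)!}{s! \cdot (k-s)!} \, (-1)^s \left( \frac{x}{L} \right)^s && \text{(A.8)} \\ &= \sum_{s=1}^{L-1} \frac{(-1)^s x^s}{s \cdot s!} \cdot \frac{(L-1)!}{L^s \, (L-s-1)!}. && \text{(A.9)} \end{aligned} $$

The second factor is smaller than 1 and bounded from below by $1 - s^2/L$, i.e. 

$$ \sum_{s=1}^{L-1} \frac{(-1)^s x^s}{s \cdot s!} - R(x, L) \le \mathcal{E}(x, L) \le \sum_{s=1}^{L-1} \frac{(-1)^s x^s}{s \cdot s!} + R(x, L) \tag{A.10} $$

with the remainder term 

$$ R(x, L) = \sum_{s=1}^{L-1} \frac{x^s}{s \cdot s!} \, \frac{s^2}{L} = \frac{1}{L} \sum_{s=1}^{L-1} \frac{x^s}{(s-1)!}. \tag{A.11} $$

Since 

$$ \lim_{L \to \infty} L \, R(x, L) = \sum_{s=1}^{\infty} \frac{x^s}{(s-1)!} = x e^x \tag{A.12} $$

is finite, $R(x, L)$ tends to zero for $L \to \infty$. By the squeeze theorem it follows that 

$$ \lim_{L \to \infty} \mathcal{E}(x, L) = \sum_{s=1}^{\infty} \frac{(-1)^s x^s}{s \cdot s!} = -\left( \log(x) + \Gamma(0, x) + \gamma \right), \tag{A.13} $$

where $\gamma \approx 0.5772\ldots$ is the Euler-Mascheroni constant and $\Gamma(a, z) = \int_z^\infty t^{a-1} e^{-t} \, dt$ is the 
incomplete gamma function. Hence the function series $r_L$ converges for $L \to \infty$ to a 
non-degenerate limiting function 

$$ r_\infty(x) = \exp\left( -x - \log(x) - \Gamma(0, x) - \gamma \right). \tag{A.14} $$

Since $r_\infty$ does not depend on L, we can extract the L-dependence of $q_L$. With the 
substitution $x = 1 - y/L$ we find 

$$ \begin{aligned} \langle h \rangle = \int_0^1 x \, q_L(x) \, dx &= \frac{1}{L} \int_0^L \left( 1 - \frac{y}{L} \right) q_L\!\left( 1 - \frac{y}{L} \right) dy && \text{(A.15)} \\ &= 1 - \frac{q_L(1)}{L^2} \int_0^L y \, r_L(y) \, dy \approx 1 - \frac{e^{\gamma}}{L} \int_0^\infty y \, r_\infty(y) \, dy && \text{(A.16)} \\ &= 1 - \frac{\alpha_{\mathrm{rnd}}}{L} && \text{(A.17)} \end{aligned} $$

with 

$$ \alpha_{\mathrm{rnd}} = \int_0^\infty \exp\left( -x - \Gamma(0, x) \right) dx = 0.6243\ldots \tag{A.18} $$

Here we used the normalization of $q_L$ and the asymptotic behavior $q_L(1) = \exp(H_{L-1}) \approx e^{\gamma} L$, 
where $H_n$ denotes the n-th harmonic number. This result was previously obtained in [60]. 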
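
The asymptotic result can be compared with a direct numerical integration at finite L. The sketch below is an illustration, not part of the original derivation: it assumes that the height density has the form $q_L(x) = x^{L-1} \exp(\sum_{k=1}^{L-1} x^k/k)$, integrates it with a simple midpoint rule, and estimates $\alpha_{\mathrm{rnd}}$ as $L (1 - \langle h \rangle)$. The function names and the grid size are arbitrary choices.

```python
from math import exp

def q_L(x, L):
    """Assumed height density q_L(x) = x^(L-1) * exp(sum_{k=1}^{L-1} x^k / k)."""
    s, xk = 0.0, 1.0
    for k in range(1, L):
        xk *= x               # xk holds x^k
        s += xk / k
    return x ** (L - 1) * exp(s)

def norm_and_mean(L, n=10000):
    """Midpoint-rule estimates of the integrals of q_L(x) and x * q_L(x) on [0, 1]."""
    h = 1.0 / n
    norm = mean = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        w = q_L(x, L) * h
        norm += w
        mean += x * w
    return norm, mean

L = 100
norm, mean = norm_and_mean(L)
alpha_est = L * (1.0 - mean / norm)
print(norm)         # should be close to 1 if q_L is normalized
print(alpha_est)    # finite-L estimate, to be compared with 0.6243...
```

The estimate differs from the limiting value by corrections of relative order 1/L, so close but not exact agreement is expected at L = 100.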


Appendix A.3. Reluctant walks 


For the reluctant walk on the HoC landscape, we find numerically that the length is 
asymptotically given by $\langle \ell \rangle = L/2$ and the height by $\langle h \rangle = 1 - 1/L$ (see figure A1). 
These results are plausible within the Gillespie approximation, a simplified setting 
where the entire adaptive walk proceeds among a single set of L random fitness values; 
in other words, the creation of a new neighborhood of independently drawn fitness 
values after each step is neglected. Somewhat surprisingly, the Gillespie approximation 
has been shown to correctly reproduce the leading order log L-behavior for the length 
of random and natural AWs [52]. For the greedy AW it trivially predicts $\ell = 1$, 
which is also rather close to the exact result $\langle \ell \rangle = e - 1 \approx 1.7183$. Within the Gillespie 
approximation, a reluctant walk visits all sites of the neighborhood in order of increasing 
fitness, and the walk length is equal to the rank of the initial fitness among the other 
L fitness values in the neighborhood minus one. It follows that the length is L/2 on 
average. 
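
The numerical observation is simple to reproduce. The following Python sketch is a minimal illustration under stated assumptions: fitness values are uniform on [0,1], a fresh set of L independent neighbor fitnesses is drawn after every step (so the fact that one neighbor is the previously visited, less fit genotype is ignored), and the walk moves to the neighbor with the smallest fitness above the current one. The function name and the sample sizes are hypothetical choices.

```python
import random

def reluctant_walk(L, rng):
    """Return (length, final fitness) of one reluctant walk on an HoC landscape."""
    fitness = rng.random()                 # fitness of the random initial genotype
    steps = 0
    while True:
        # draw L fresh neighbor fitnesses and keep those above the current one
        fitter = [f for f in (rng.random() for _ in range(L)) if f > fitness]
        if not fitter:                     # local maximum reached
            return steps, fitness
        fitness = min(fitter)              # reluctant: smallest possible increase
        steps += 1

rng = random.Random(2024)
L, walks = 50, 1000
results = [reluctant_walk(L, rng) for _ in range(walks)]
mean_len = sum(s for s, _ in results) / walks
mean_h = sum(h for _, h in results) / walks
print(mean_len, L / 2)          # simulated length vs L/2
print(mean_h, 1 - 1 / L)        # simulated height vs 1 - 1/L
```

Replacing `min(fitter)` by `max(fitter)` or by `rng.choice(fitter)` gives the greedy and random adaptive walks in the same setting.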

As a starting point for a systematic treatment of reluctant AWs on the HoC 
landscape, we derive a recurrence relation for the quantity 

$$ P_\ell(x) \, dx := P(\text{fitness in } [x, x + dx] \text{ after } \ell \text{ steps}), \tag{A.19} $$

following the procedure of Flyvbjerg and Lautrup [61]. The recurrence relation in general 
reads 

$$ P_{\ell+1}(x) = \int_0^x P_\ell(y) \, \gamma(y \to x) \, dy, \tag{A.20} $$

where $\gamma(y \to x)$ is the probability density of the smallest of L random variables that 
is larger than y. For uniform random variables and conditioned on there being $k > 0$ 
random variables larger than y, the density is given by 

$$ \gamma(y \to x \mid k) = \frac{k}{1 - y} \left( 1 - \frac{x - y}{1 - y} \right)^{k-1} \tag{A.21} $$

and hence 

$$ \begin{aligned} \gamma(y \to x) &= \sum_{k=1}^{L} \binom{L}{k} (1 - y)^k \, y^{L-k} \, \gamma(y \to x \mid k) && \text{(A.22)} \\ &= \sum_{k=1}^{L} \binom{L}{k} \, k \, (1 - x)^{k-1} \, y^{L-k} && \text{(A.23)} \\ &= L \, (1 - x + y)^{L-1}. && \text{(A.24)} \end{aligned} $$

Then equation (A.20) becomes 

$$ P_{\ell+1}(x) = L \int_0^x P_\ell(y) \, (1 - x + y)^{L-1} \, dy, \tag{A.25} $$

from which quantities of interest like walk lengths and heights could in principle be 
extracted. However, so far we have not succeeded in solving the recursion (A.25). 

Appendix A.4. Walk height for fitness distributions in the Gumbel class 

So far, the calculations of the mean value of the walk height h were based on the 
assumption that the fitness values are uniformly distributed. Obviously, fitness values 
drawn from another continuous distribution can always be transformed to the uniform 
case (and vice versa), since for a random variable X with cumulative distribution 
function Q the distribution of Q(X) is uniform. Therefore, the transformed height 
Q(h) for an arbitrary distribution has the same probability density function q(x) as the 
height in the uniform case. One could in principle get to the mean height h by 

$$ \langle h \rangle = \left\langle Q^{-1}(Q(h)) \right\rangle = \int_0^1 Q^{-1}(x) \, q(x) \, dx, \tag{A.26} $$

but in practice this integral can be cumbersome to evaluate. However, for fitness values 
drawn from a distribution in the Gumbel class of extreme value theory, e.g., a Gaussian 
distribution, there is a simple approximation for the relation between h and Q(h). 

Let $X_1, \dots, X_n$ be i.i.d. random variables with cumulative distribution function 
$Q(x) = 1 - \exp(-\lambda x)$ and $M_n = \max(X_1, \dots, X_n)$. The mean value of $Q(M_n)$ is given by 

$$ \langle Q(M_n) \rangle = 1 - \frac{1}{n + 1} \tag{A.27} $$

whereas the mean value of $M_n$ is given by [63] 

$$ \lambda \langle M_n \rangle = H_n = \sum_{i=1}^{n} \frac{1}{i} = \log n + \gamma + O\!\left( \frac{1}{n} \right), \tag{A.28} $$

where $H_n$ is the n-th harmonic number. This yields 

$$ Q(\langle M_n \rangle) \approx 1 - \frac{e^{-\gamma}}{n} \approx 1 - e^{-\gamma} \left( 1 - \langle Q(M_n) \rangle \right). \tag{A.29} $$

Figure A1. (a) Mean walk length $\langle \ell \rangle$ and (b) height $\langle h \rangle$ on the House-of-Cards 
landscape. The numerical results suggest that $\langle \ell \rangle = L/2$ and $\langle h \rangle = 1 - 1/L$ for the 
reluctant walk. 

Figure A2. Height of random and greedy adaptive walks on the HoC landscape with 
fitness values distributed according to a standard normal distribution and a standard 
exponential distribution, respectively. Symbols correspond to simulation results, lines 
correspond to (A.30). 

Because the Pickands-Balkema-de Haan theorem [65, 68, 69] states that the tail of a 
distribution from the Gumbel class is well described by the exponential distribution, 
this approximation is also valid for more general choices of Q if n is large. We now 
assume that h behaves statistically like the maximum of n random variables, i.e., like 
$M_n$. Although n is not necessarily known in the case of adaptive walks, we can still 
use (A.29) to obtain 

$$ \langle h \rangle \approx Q^{-1}\left[ 1 - e^{-\gamma} \left( 1 - \langle Q(h) \rangle \right) \right] = Q^{-1}\left( 1 - e^{-\gamma} \, \frac{\alpha}{L} \right), \tag{A.30} $$

where $\alpha$ is the factor depending on the walk type that was derived above. Despite the 
somewhat uncontrolled nature of the approximation, the result is quite precise as shown 
in figure A2. 
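
For the exponential distribution every step of this argument can be checked explicitly. The Python sketch below is an illustration with hypothetical function names: it compares $Q(\langle M_n \rangle) = 1 - \exp(-H_n)$ with the approximation $1 - e^{-\gamma}(1 - \langle Q(M_n) \rangle)$, and then evaluates the resulting height estimate with $Q^{-1}(u) = -\log(1 - u)/\lambda$ and $1 - \langle Q(h) \rangle = \alpha/L$. The value $\alpha = 0.4003$ for greedy walks is taken from Appendix A.1.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329

def harmonic(n):
    """n-th harmonic number H_n."""
    return sum(1.0 / i for i in range(1, n + 1))

def q_of_mean_max(n, lam=1.0):
    """Q(<M_n>) for exponential variables, using lam * <M_n> = H_n."""
    mean_max = harmonic(n) / lam
    return 1.0 - exp(-lam * mean_max)

def q_of_mean_max_approx(n):
    """Approximation 1 - e^{-gamma} (1 - <Q(M_n)>) with <Q(M_n)> = 1 - 1/(n+1)."""
    mean_q = 1.0 - 1.0 / (n + 1)
    return 1.0 - exp(-EULER_GAMMA) * (1.0 - mean_q)

def mean_height_exponential(alpha, L, lam=1.0):
    """Height estimate Q^{-1}(1 - e^{-gamma} alpha / L) for Q(x) = 1 - exp(-lam x)."""
    return -log(exp(-EULER_GAMMA) * alpha / L) / lam

n = 1000
print(q_of_mean_max(n), q_of_mean_max_approx(n))
print(mean_height_exponential(alpha=0.4003, L=100))   # greedy walk, exponential fitness
```

For other members of the Gumbel class only the inverse distribution function in the last step has to be replaced.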


Appendix B. Number of maxima in the adjacent NK-model with K = 2 

In the adjacent NK model with K = 2, the fitness $F(\sigma)$ of a genotype $\sigma \in \{0, 1\}^L$ is given 
by 

$$ F(\sigma) = \sum_{i=1}^{L} \eta_i(\sigma_i, \sigma_{i+1}), \tag{B.1} $$

where $\sigma_{L+1} = \sigma_1$ and the $\eta_i(\sigma, \tau)$ are random numbers independently drawn from a 
distribution with density function f for each i, $\sigma$ and $\tau$, i.e., one needs 4L random 
numbers to specify the whole model. In the following we will show that, if the random 
numbers are drawn from a Gamma distribution with shape parameter p = 1/2, the 
mean number $N_{\max}$ of local maxima in such a landscape is given by 

$$ N_{\max} = (2\lambda_+)^L + (2\lambda_-)^L, \tag{B.2} $$

where 

$$ \lambda_\pm = \frac{1}{6} \left( 3 - \sqrt{3} \pm \sqrt{6 \left( \sqrt{3} - 1 \right)} \right). \tag{B.3} $$

In order to derive this result, we consider a specific genotype $\sigma$ which without loss 
of generality can be chosen as the all-zero genotype $\sigma = (0, \dots, 0)$, and calculate the 
probability $P_{\max}$ that it is a local optimum. Its fitness is determined by the sum of L 
random numbers, which will be denoted by $x_i = \eta_i(0, 0)$ and are fixed in the following. 
If a mutation occurs at position j of the genome, the contributions $x_{j-1}$ and $x_j$ (with 
$x_0 = x_L$) are replaced by two new random variables $x'_{j-1}$ and $x'_j$, respectively. Obviously, 
if $\sigma$ is a local optimum, $x'_{j-1} + x'_j < x_{j-1} + x_j$ must hold true. Since the $x_i$ are fixed, 
this probability can be written as 

$$ P(x'_{j-1} + x'_j < x_{j-1} + x_j) = F(x_{j-1} + x_j), \tag{B.4} $$

where 

$$ F(x) = \int_{-\infty}^{x} \left( \int_{-\infty}^{\infty} f(z) \, f(y - z) \, dz \right) dy \tag{B.5} $$

is the cumulative distribution function of the sum of two random variables drawn 
from f. Note that the $x'_i$ which can occur due to mutations are independent and therefore 
the probability that $\sigma$ is a maximum [for fixed $x = (x_1, \dots, x_L)$] is given by 

$$ P_{\max}(x) = F(x_1 + x_2) \, F(x_2 + x_3) \cdots F(x_{L-1} + x_L) \, F(x_L + x_1). \tag{B.6} $$

The actual probability $P_{\max}$ can then be obtained by integrating over all values of x, 
i.e., 

$$ P_{\max} = \int \prod_{n=1}^{L} \left( f(x_n) \, F(x_n + x_{n+1}) \right) d^L x. \tag{B.7} $$

This integral can be solved exactly if the contributions are drawn from a Gamma 
distribution with shape parameter p = 1/2, i.e., the density function f is given by 

$$ f(x) = \frac{\exp(-x)}{\sqrt{\pi x}} \tag{B.8} $$


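for x > 0 and zero otherwise. The sum of two random variables drawn from this 
distribution is exponentially distributed, i.e., 

$$ F(x) = 1 - e^{-x} \tag{B.9} $$

and equation (B.7) becomes 

$$ P_{\max} = \frac{1}{\pi^{L/2}} \int \prod_{n=1}^{L} \frac{\exp(-x_n)}{\sqrt{x_n}} \left( 1 - e^{-x_n - x_{n+1}} \right) d^L x. \tag{B.10} $$

Expanding the product yields 

$$ \prod_{n=1}^{L} \left( 1 - e^{-x_n - x_{n+1}} \right) = \sum_{\sigma} \prod_{n=1}^{L} (-1)^{\sigma_n} \, e^{-x_n (\sigma_{n-1} + \sigma_n)}, \tag{B.11} $$

where the sum goes over all $\sigma \in \{0, 1\}^L$ and indices are understood modulo L. Inserting (B.11) into (B.10) gives 

$$ \begin{aligned} P_{\max} &= \frac{1}{\pi^{L/2}} \int \sum_{\sigma} \prod_{n=1}^{L} (-1)^{\sigma_n} \, \frac{e^{-x_n (\sigma_{n-1} + \sigma_n + 1)}}{\sqrt{x_n}} \, d^L x && \text{(B.12)} \\ &= \frac{1}{\pi^{L/2}} \sum_{\sigma} \prod_{n=1}^{L} (-1)^{\sigma_n} \int_0^\infty \frac{\exp\left( -x (\sigma_{n-1} + \sigma_n + 1) \right)}{\sqrt{x}} \, dx && \text{(B.13)} \\ &= \sum_{\sigma} \prod_{n=1}^{L} \frac{(-1)^{\sigma_n}}{\sqrt{\sigma_{n-1} + \sigma_n + 1}} = \sum_{\sigma} \prod_{n=1}^{L} T_{\sigma_n \sigma_{n+1}} && \text{(B.14)} \end{aligned} $$

with the matrix 

$$ T = \begin{pmatrix} 1 & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{pmatrix}. \tag{B.15} $$

Equation (B.14) is of the form of a partition function of a spin chain and T is the 
corresponding transfer matrix. Using this analogy, one can write 

$$ P_{\max} = \mathrm{Tr}\left( T^L \right) = \lambda_+^L + \lambda_-^L, \tag{B.16} $$

where $\lambda_\pm$ are the eigenvalues of T which are given by equation (B.3). The final result 
is obtained by multiplying the probability $P_{\max}$ with the total number $2^L$ of genotypes 
which yields 

$$ N_{\max} = P_{\max} \cdot 2^L = (2\lambda_+)^L + (2\lambda_-)^L. \tag{B.17} $$

For large L the behavior is governed by the larger eigenvalue $\lambda_+ = 0.5606\ldots$. 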
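
The transfer matrix result can be verified numerically for small L by summing over all $2^L$ configurations. The Python sketch below is an illustration only; it assumes the matrix elements $T_{\sigma\tau} = (-1)^{\tau}/\sqrt{\sigma + \tau + 1}$ (the transposed convention gives the same trace) and periodic boundary conditions, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

# Assumed transfer matrix: T[s][t] = (-1)^t / sqrt(s + t + 1)
T = [[1.0, -1.0 / sqrt(2.0)],
     [1.0 / sqrt(2.0), -1.0 / sqrt(3.0)]]

def p_max_bruteforce(L):
    """Sum over all sigma in {0,1}^L of the product of T elements along the ring."""
    total = 0.0
    for sigma in product((0, 1), repeat=L):
        term = 1.0
        for n in range(L):
            term *= T[sigma[n]][sigma[(n + 1) % L]]
        total += term
    return total

def eigenvalues():
    """Eigenvalues of the 2x2 matrix T from its trace and determinant."""
    tr = T[0][0] + T[1][1]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    disc = sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def n_max(L):
    """Mean number of local maxima, (2 lambda_+)^L + (2 lambda_-)^L."""
    lam_plus, lam_minus = eigenvalues()
    return (2.0 * lam_plus) ** L + (2.0 * lam_minus) ** L

lam_plus, lam_minus = eigenvalues()
print(lam_plus, lam_minus)                                   # lam_plus is 0.5606...
print(p_max_bruteforce(6), lam_plus ** 6 + lam_minus ** 6)   # should agree
print(n_max(20))
```

For L = 2 the sum gives $P_{\max} = 1/3$, which coincides with the HoC value 1/(L + 1), as expected because two mutually interacting loci form an uncorrelated landscape.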
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